
    In-Network Data Reduction Approach Based On Smart Sensing

    The rapid advances in wireless communication and sensor technologies facilitate the development of viable mobile-health applications that create opportunities for ubiquitous real-time healthcare monitoring without constraining patients' activities. However, remote healthcare monitoring requires continuous sensing of different analog signals, which generates large volumes of data that need to be processed, recorded, and transmitted. Developing efficient in-network data reduction techniques is therefore essential in such applications. In this paper, we propose an in-network approach for data reduction based on fuzzy formal concept analysis. The goal is to reduce the amount of transmitted data by keeping only the minimal representative data for each class of patients. Using this approach, the sender can effectively reconfigure its transmission settings by varying the target precision level while maintaining the classification accuracy required by the application. Our results show the excellent performance of the proposed scheme in terms of data reduction gain and classification accuracy, and the advantages it exhibits with respect to state-of-the-art techniques.
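The "target precision level" acting as a tuning knob between reduction gain and fidelity can be illustrated with a toy quantize-and-deduplicate reducer. This is only a sketch of the trade-off, not the paper's fuzzy formal concept analysis method; the function name and data are illustrative assumptions.

```python
# Toy illustration of precision-driven reduction: quantize readings to a
# target precision and transmit only distinct values. Lowering the precision
# increases the reduction gain at the cost of fidelity.
# (Illustrates the tuning knob only; the paper's actual method is based on
# fuzzy formal concept analysis and is not reproduced here.)

def reduce_readings(readings, precision):
    """Round to `precision` decimal places and drop duplicates, keeping order."""
    seen, kept = set(), []
    for r in readings:
        q = round(r, precision)
        if q not in seen:
            seen.add(q)
            kept.append(q)
    return kept

readings = [36.61, 36.62, 36.58, 37.41, 37.44]
print(reduce_readings(readings, 1))  # [36.6, 37.4]
```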

    Preface

    In recent years, the involvement of Information Technology in business, government, and education has increased dramatically. More and more research has been conducted in different areas of Information Technology, such as Artificial Intelligence, Database Management, Algorithms, Web Technologies, Computer Graphics, Networks, etc. In recognition of this importance and these major advances, Information Technology has been chosen as the theme of this special issue of the Information Science Journal.

    Named Entity Disambiguation using Hierarchical Text Categorization

    Named entity extraction is an important step in natural language processing. It aims at finding the entities which are present in text such as organizations, places or persons. Named entities extraction is of a paramount importance when it comes to automatic translation as different named entities are translated differently. Named entities are also very useful for advanced search engines which aim at searching for a detailed information regarding a specific entity. Named entity extraction is a difficult problem as it usually requires a disambiguation step as the same word might belong to different named entities depending on the context. This work has been conducted on the ANERCorp named entities database. This Arabic database contains four different named entities: person, organization, location and miscellaneous. The database contains 6099 sentences, out of which 60% are used for training 20% for validation and 20% for testing. Our method for named entity extraction contains two main steps: the first step predicts the list of named entities which are present at the sentence level. The second step predicts the named entity of each word of the sentence. The prediction of the list of named entities at the sentence level is done through separating the document into sentences using punctuation marks. Subsequently, a binary relation between the set of sentences (x) and the set of words (y) is created from the obtained list of sentences. A relation exists between the sentence (x) and the word (y) if, and only if, (x) contains (y). A binary relation is created for each category of named entities (person, organization, location and miscellaneous). If a sentence contains several named entities, it is duplicated in the relation corresponding to each one of them. Our method then extracts keywords from the obtained binary relations using the hyper concept method [1]. 
This method decomposes the original relation into non-overlapping rectangles and highlights for each rectangle the most representative keyword. The output is a list of keywords sorted in a hierarchical ordering of importance. The obtained keyword list associated with each category of named entities are fed into a random forest classifier of 10000 random trees in order to predict the list of named entities associated with each sentence. The random forest classifier produces for each sentence the list of probabilities corresponding to the existence of each category of named entities within the sentence. Random Forest [sentence(i)] = (P(Person),P(Organization),P(Location),P(miscellaneous)). Subsequently, the sentence is associated with the named entities for which the corresponding probability is larger than a threshold set empirically on the validation set. In the second step, we create a lookup table associating to each word in the database, the list of named entities to which it corresponds in the training set. For unseen sentences of the test set, the list of named entities predicted at the sentence level is produced, and for each word, the list of predicted named entities is also produced using the lookup table previously built. Ultimately, for each word, the intersection between the two predicted lists of named entities (at the sentence and the word level) will give the final predicted named entity. In the case where more than one named entity is produced at this stage, the one with the maximum probability is kept. We obtained an accuracy of 76.58% when only considering lookup tables of named entities produced at the word level. When performing the intersection with the list produced at the sentence level the accuracy reaches 77.96%. In conclusion, the hierarchical named entity extraction leads to improved results over direct extraction. Future work includes the use of other linguist features and larger lookup table in order to improve the results. 
Validation on other state of the art databases is also considered. Acknowledgements This contribution was made possible by NPRP grant #06-1220-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.qscienc
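The two-step intersection described in this abstract (sentence-level probabilities filtered by a threshold, intersected with a word-level lookup table, ties broken by maximum probability) can be sketched as follows. The function names, threshold value, and toy data are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the two-step entity assignment: intersect the sentence-level
# prediction with the word-level lookup, then break ties by probability.

ENTITY_TYPES = ["person", "organization", "location", "miscellaneous"]
THRESHOLD = 0.5  # assumed value; the paper sets it empirically on the validation set

def sentence_level_entities(probs, threshold=THRESHOLD):
    """Keep every entity type whose sentence-level probability exceeds the threshold."""
    return {e for e, p in probs.items() if p >= threshold}

def predict_word_entity(word, sentence_probs, lookup):
    """Intersect sentence-level and word-level predictions; keep the most probable."""
    sentence_set = sentence_level_entities(sentence_probs)
    word_set = set(lookup.get(word, []))
    candidates = sentence_set & word_set
    if not candidates:
        return None  # the abstract does not specify a fallback for an empty intersection
    return max(candidates, key=lambda e: sentence_probs[e])

# toy example: the lookup says "Doha" was seen as a location or an organization,
# but the sentence-level classifier only passes "location" over the threshold
lookup = {"Doha": ["location", "organization"]}
probs = {"person": 0.1, "organization": 0.3, "location": 0.9, "miscellaneous": 0.2}
print(predict_word_entity("Doha", probs, lookup))  # location
```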

    Sentiment Analysis in Comments Associated to News Articles: Application to Al Jazeera Comments

    Sentiment analysis is a very important research task that aims at understanding the general sentiment of a specific community or group of people. Sentiment analysis of Arabic content is still in its early development stages. In the scope of Islamic content mining, sentiment analysis helps in understanding which topics Muslims around the world are discussing, which topics are trending, and which topics will be trending in the future. This study has been conducted on a dataset of 5000 comments on news articles collected from the Al Jazeera Arabic website. All articles were about the recent war against the Islamic State. The database has been annotated using Crowdflower, which is a website for crowdsourcing the annotation of datasets. Users manually selected whether the sentiment associated with a comment was positive, negative, or neutral. Each comment has been annotated by four different users, and each annotation is associated with a confidence level between 0 and 1. The confidence level indicates whether the users who annotated the same comment agreed (1 corresponds to full agreement between the four annotators and 0 to full disagreement).
    Our method represents the corpus by a binary relation between the set of comments (x) and the set of words (y). A relation exists between the comment (x) and the word (y) if, and only if, (x) contains (y). Three binary relations are created for comments associated with positive, negative, and neutral sentiments. Our method then extracts keywords from the obtained binary relations using the hyper concept method [1]. This method decomposes the original relation into non-overlapping rectangles and highlights for each rectangle the most representative keyword. The output is a list of keywords sorted in a hierarchical ordering of importance. The keyword lists obtained for the positive, negative, and neutral comments are fed into a random forest classifier of 1000 random trees in order to predict the sentiment associated with each comment of the test set. Experiments have been conducted after splitting the database into 70% training and 30% testing subsets. Our method achieves a correct classification rate of 71% when considering annotations with all values of confidence, and 89% when considering only the annotations with a confidence value equal to 1. These results are very promising and testify to the relevance of the extracted keywords.
    In conclusion, the hyper concept method extracts discriminative keywords which can be used to successfully distinguish between comments containing positive, negative, and neutral sentiments. Future work includes performing further experiments using a varying threshold for the confidence value. Moreover, by applying a part-of-speech tagger, it is planned to perform keyword extraction on words corresponding to specific grammatical roles (adjectives, verbs, nouns, etc.). Finally, it is also planned to test this method on publicly available datasets such as the Rotten Tomatoes Movie Reviews dataset [2].
    Acknowledgment: This contribution was made possible by NPRP grant #06-1220-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
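The comment-word binary relation at the heart of this pipeline can be sketched as a simple incidence matrix builder. The function name, tokenization, and toy data below are illustrative assumptions; the real pipeline builds one relation per sentiment class and then applies the hyper concept method and a 1000-tree random forest, which are not reproduced here.

```python
# Minimal sketch of the comment-word binary relation: relation[i][j] = 1
# if, and only if, comment i contains vocabulary word j.

def build_relation(comments, vocabulary):
    """Build a binary incidence matrix between comments and words."""
    relation = []
    for comment in comments:
        words = set(comment.lower().split())  # naive whitespace tokenization (assumed)
        relation.append([1 if w in words else 0 for w in vocabulary])
    return relation

comments = ["great insightful article", "terrible biased coverage"]
vocabulary = ["great", "terrible", "coverage"]
print(build_relation(comments, vocabulary))  # [[1, 0, 0], [0, 1, 1]]
```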

    Edge-based Compression and Classification for Smart Healthcare Systems: Concept, Implementation and Evaluation

    Smart healthcare systems require recording, transmitting, and processing large volumes of multimodal medical data generated by different types of sensors and medical devices, which is challenging and may render some remote health monitoring applications impractical. Moving computational intelligence to the network edge is a promising approach for providing efficient and convenient ways of continuous remote monitoring. Implementing efficient edge-based classification and data reduction techniques is of paramount importance to enable smart healthcare systems with efficient real-time and cost-effective remote monitoring. Thus, we present our vision of leveraging edge computing to monitor, process, and make autonomous decisions for smart health applications. In particular, we present and implement an accurate and lightweight classification mechanism that, leveraging some time-domain features extracted from the vital signs, allows for reliable seizure detection at the network edge with precise classification accuracy and low computational requirements. We then propose and implement a selective data transfer scheme, which opts for the most convenient way of data transmission depending on the detected patient's condition. In addition, we propose a reliable, energy-efficient emergency notification system for epileptic seizure detection, based on conceptual learning and fuzzy classification. Our experimental results assess the performance of the proposed system in terms of data reduction, classification accuracy, battery lifetime, and transmission delay. We show the effectiveness of our system and its ability to outperform conventional remote monitoring systems that ignore data processing at the edge by: (i) achieving 98.3% classification accuracy for seizure detection, (ii) extending battery lifetime by 60%, and (iii) decreasing average transmission delay by 90%.
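The combination of time-domain features and selective data transfer can be sketched as an edge-side policy: compute a compact feature summary locally, and send the raw window only when it looks abnormal. The feature set, threshold, and function names are assumptions for illustration, not the paper's actual classifier.

```python
# Illustrative sketch of an edge-side selective transfer policy: transmit the
# full raw window only when the features suggest abnormal activity; otherwise
# transmit only the compact feature summary to save bandwidth and energy.

import statistics

def time_domain_features(signal):
    """A few simple time-domain features computed at the edge (assumed set)."""
    return {
        "mean": statistics.fmean(signal),
        "stdev": statistics.pstdev(signal),
        "peak": max(abs(x) for x in signal),
    }

def select_transfer(signal, stdev_limit=1.5):
    """Choose between raw and summarized transmission based on the features."""
    feats = time_domain_features(signal)
    if feats["stdev"] > stdev_limit:   # possible seizure-like activity (toy rule)
        return ("raw", signal)         # full-fidelity transmission
    return ("summary", feats)          # reduced transmission

mode, _payload = select_transfer([0.1, 0.0, -0.1, 0.2])
print(mode)  # summary
```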

    Breast cancer image classification using pattern-based Hyper Conceptual Sampling method

    The increase in biomedical data has given rise to the need for data sampling techniques. With the emergence of big data and the rising popularity of data science, sampling or reduction techniques significantly hasten the data analytics process. Intuitively, without sampling techniques, it would be difficult to efficiently extract useful patterns from a large dataset. By using sampling techniques, data analysis can effectively be performed on huge datasets to produce a relatively small portion of data containing the most representative objects from the original dataset. However, to reach effective conclusions and predictions, the samples should preserve the data behavior. In this paper, we propose a unique data sampling technique which exploits the notion of formal concept analysis. Machine learning experiments are performed on the resulting sample to evaluate its quality, and the performance of our method is compared with another sampling technique proposed in the literature. The results demonstrate the effectiveness and competitiveness of the proposed approach in terms of sample size and quality, as determined by accuracy and the F1-measure.
    This contribution was made possible by NPRP-07-794-1-145 grant from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

    Conceptual data sampling for breast cancer histology image classification

    Data analytics has become increasingly complicated as the amount of data has increased. One technique used to enable data analytics on large datasets is data sampling, in which a portion of the data is selected so as to preserve the data characteristics for use in data analytics. In this paper, we introduce a novel data sampling technique that is rooted in formal concept analysis theory. This technique is used to create samples based on the data distribution across a set of binary patterns. The proposed sampling technique is applied to classifying the regions of breast cancer histology images as malignant or benign. The performance of our method is compared to other classical sampling methods. The results indicate that our method is efficient and generates a representative sample of small size. It is also competitive with other sampling methods in terms of sample size and sample quality, represented by classification accuracy and F1-measure.
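The idea of sampling "based on the data distribution across a set of binary patterns" can be roughly illustrated by keeping one representative object per distinct binary attribute pattern. This is a sketch of the general idea only, with hypothetical names and data; the paper's actual algorithm is grounded in formal concept analysis and is not reproduced here.

```python
# Rough sketch of pattern-based sampling: keep one representative object per
# distinct binary attribute pattern, so every pattern present in the dataset
# is still present in the sample.

def pattern_sample(objects):
    """objects: list of (id, binary_pattern) pairs; return one id per pattern."""
    seen = {}
    for obj_id, pattern in objects:
        key = tuple(pattern)
        if key not in seen:        # first object with this pattern represents it
            seen[key] = obj_id
    return sorted(seen.values())

data = [("a", [1, 0, 1]), ("b", [1, 0, 1]), ("c", [0, 1, 0])]
print(pattern_sample(data))  # ['a', 'c']
```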